{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regression Against Time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Formation Process of Winners and Losers in Momentum Investing\n",
    "(https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2610571)\n",
    "\n",
    "> **p. 3**: Intermediate-term (3–12 months) momentum has been documented by Jegadeesh\n",
    "and Titman (1993, 2001, hereafter JT), while short-term (weekly) and long-term (3–5\n",
    "years) reversals have been documented by Lehmann (1990) and Jegadeesh (1990) and\n",
    "by DeBondt and Thaler (1985), respectively. Various models and theories have been\n",
    "proposed to explain the coexistence of intermediate-term momentum and long-term\n",
    "reversal. However, most studies have focused primarily on which stocks are winners\n",
    "or losers; they have paid little attention to how those stocks become winners or losers.\n",
    "This paper develops a model to analyze whether the movement of historical prices is\n",
    "related to future expected returns.\n",
    "\n",
    "> **p. 4**: This paper captures the idea that past returns and the formation process of past\n",
    "returns have a joint effect on future expected returns. We argue that how one stock\n",
    "becomes a winner or loser—that is, the movement of historical prices—plays an\n",
    "important role in momentum investing. Using a polynomial quadratic model to\n",
    "approximate the nonlinear pattern of historical prices, the model shows that as long as\n",
    "two stocks share the same return over the past n-month, the future expected return of\n",
    "the stock whose historical prices are convex shaped is not lower than one whose\n",
    "historical prices are concave shaped. In other words, when there are two winner (or\n",
    "loser) stocks, the one with convex-shaped historical prices will possess higher future\n",
    "expected returns than the one with concave-shaped historical prices.\n",
    "\n",
    "> **p. 4**: To test the model empirically, we regress previous daily prices in the ranking\n",
    "period on an ordinal time variable and the square of the ordinal time variable for each\n",
    "stock. The coefficient of the square of the ordinal time variable is denoted as $\\gamma$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!{sys.executable} -m pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cvxpy as cvx\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import time\n",
    "import os\n",
    "import quiz_helper\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = (14, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### data bundle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import quiz_helper\n",
    "from zipline.data import bundles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')\n",
    "ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)\n",
    "bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)\n",
    "print('Data Registered')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build pipeline engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline import Pipeline\n",
    "from zipline.pipeline.factors import AverageDollarVolume\n",
    "from zipline.utils.calendars import get_calendar\n",
    "\n",
    "universe = AverageDollarVolume(window_length=120).top(500) \n",
    "trading_calendar = get_calendar('NYSE') \n",
    "bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)\n",
    "engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data¶\n",
    "With the pipeline engine built, let's get the stocks at the end of the period in the universe we're using. We'll use these tickers to generate the returns data for the our risk model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')\n",
    "\n",
    "universe_tickers = engine\\\n",
    "    .run_pipeline(\n",
    "        Pipeline(screen=universe),\n",
    "        universe_end_date,\n",
    "        universe_end_date)\\\n",
    "    .index.get_level_values(1)\\\n",
    "    .values.tolist()\n",
    "    \n",
    "universe_tickers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get Returns data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.data.data_portal import DataPortal\n",
    "\n",
    "data_portal = DataPortal(\n",
    "    bundle_data.asset_finder,\n",
    "    trading_calendar=trading_calendar,\n",
    "    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,\n",
    "    equity_minute_reader=None,\n",
    "    equity_daily_reader=bundle_data.equity_daily_bar_reader,\n",
    "    adjustment_reader=bundle_data.adjustment_reader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get pricing data helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from quiz_helper import get_pricing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## get pricing data into a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "returns_df = \\\n",
    "    get_pricing(\n",
    "        data_portal,\n",
    "        trading_calendar,\n",
    "        universe_tickers,\n",
    "        universe_end_date - pd.DateOffset(years=5),\n",
    "        universe_end_date)\\\n",
    "    .pct_change()[1:].fillna(0) #convert prices into returns\n",
    "\n",
    "returns_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sector data helper function\n",
    "We'll create an object for you, which defines a sector for each stock.  The sectors are represented by integers.  We inherit from the Classifier class.  [Documentation for Classifier](https://www.quantopian.com/posts/pipeline-classifiers-are-here), and the [source code for Classifier](https://github.com/quantopian/zipline/blob/master/zipline/pipeline/classifiers/classifier.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.classifiers import Classifier\n",
    "from zipline.utils.numpy_utils import int64_dtype\n",
    "class Sector(Classifier):\n",
    "    dtype = int64_dtype\n",
    "    window_length = 0\n",
    "    inputs = ()\n",
    "    missing_value = -1\n",
    "\n",
    "    def __init__(self):\n",
    "        self.data = np.load('../../data/project_4_sector/data.npy')\n",
    "\n",
    "    def _compute(self, arrays, dates, assets, mask):\n",
    "        return np.where(\n",
    "            mask,\n",
    "            self.data[assets],\n",
    "            self.missing_value,\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sector = Sector()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## We'll use 2 years of data to calculate the factor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** Going back 2 years falls on a day when the market is closed. Pipeline package doesn't handle start or end dates that don't fall on days when the market is open. To fix this, we went back 2 extra days to fall on the next day when the market is open."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)\n",
    "factor_start_date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## describing price over time with a curve\n",
    "\n",
    "To describe price over time, we'll use integers that increment each day as the independent variable.  We'll use price as the dependent variable.  Let's practice how to regress the stock price against time and time squared.  This will allow us to describe the trajectory of price over time using a polynomial.\n",
    "\n",
    "$ ClosePrice_i = \\beta \\times time_i + \\gamma \\times time_i^2$\n",
    "\n",
    "First, we'll use `numpy.arange(days)` where days might be 5 for a week, or 252 for a year's worth of data.  So we'll have integers represent the days in this window.\n",
    "\n",
    "To create a 2D numpy array, we can combine them together in a list.  By default, the `numpy.arange` arrays are row vectors, so we use transpose to make them column vectors (one column for each independent variable).\n",
    "\n",
    "We instantiate a LinearRegression object, then call `.fit(X,y)`, passing in the independent and depend variables.  \n",
    "\n",
    "We'll use `.coefficient` to access the coefficients estimated from the data.  There is one for each independent variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# we're choosing a window of 5 days as an example\n",
    "X = np.array([np.arange(5), np.arange(5)**2])\n",
    "X = X.T\n",
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#we're making up some numbers to represent the stock price\n",
    "y = np.array(np.random.random(5)*2)\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reg = LinearRegression()\n",
    "reg.fit(X,y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 1\n",
    "Output the estimates for $\\beta$ and $\\gamma$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO Output the estimates for Beta and gamma\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## outputs\n",
    "`outputs` is a class variable defined in CustomFactor class.  We'll set outputs to a list of strings, representing the member variables of the `out` object.\n",
    "\n",
    "* outputs (iterable[str], optional) – An iterable of strings which represent the names of each output this factor should compute and return. If this argument is not passed to the CustomFactor constructor, we look for a class-level attribute named outputs.\n",
    "\n",
    ">So for example, if we create a subclass that inherits from CustomFactor, we can define a class level variable `outputs = ['var1','var2']`, passing in the names of the variables as strings.\n",
    "\n",
    "Here's how this variable is used inside the `compute` function:\n",
    ">out : np.array[self.dtype, ndim=1]\n",
    "    Output array of the same shape as `assets`.  `compute` should write\n",
    "    its desired return values into `out`. If multiple outputs are\n",
    "    specified, `compute` should write its desired return values into\n",
    "    `out.<output_name>` for each output name in `self.outputs`.\n",
    "\n",
    "So if we define `outputs = ['var1', 'var2']`, then inside our `compute` function, we'll have `out.var1` and `out.var2` that are numpy arrays.  Each of these numpy arrays has one element for each stock that we're processing (this is done for us by the code we inherited from CustomFactor.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## numpy.isfinite\n",
    "\n",
    "Numpy has a way to check for `NaN` (not a number) using `numpy.isnan()`.  We can also check if a number is neither `NaN` nor infinite using `numpy.isfinite()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quiz 2: Regression against time\n",
    "\n",
    "We'll construct a class that inherits from CustomFactor, called `RegressionAgainstTime`.  It will perform a regression on one year's worth of daily data at a time.  If the stock price is either NaN or infinity (bad data, or an infinitely amazing company!), then we don't want to run it through a regression.\n",
    "\n",
    "**Hint:**  See how we do things for the beta variable, and you can do something similar for the gamma variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.data import USEquityPricing\n",
    "from zipline.pipeline.factors import CustomFactor\n",
    "class RegressionAgainstTime(CustomFactor):\n",
    "\n",
    "    #TODO: choose a window length that spans one year's worth of trading days\n",
    "    window_length = # ...\n",
    "    \n",
    "    #TODO: use USEquityPricing's close price\n",
    "    inputs = # ...\n",
    "    \n",
    "    #TODO: set outputs to a list of strings, which are names of the outputs\n",
    "    #We're calculating regression coefficients for two independent variables, \n",
    "    # called beta and gamma\n",
    "    outputs = # [..., ...]\n",
    "    \n",
    "    def compute(self, today, assets, out, dependent):\n",
    "        \n",
    "        #TODO: define an independent variable that represents time from the start to end\n",
    "        # of the window length. E.g. [1,2,3...252]\n",
    "        t1 = # ...\n",
    "        \n",
    "        #TODO: define a second independent variable that represents time ^2\n",
    "        t2 = # ...\n",
    "        \n",
    "        # combine t1 and t2 into a 2D numpy array\n",
    "        X = # ...\n",
    "\n",
    "    \n",
    "        #TODO: the number of stocks is equal to the length of the \"out\" variable,\n",
    "        # because the \"out\" variable has one element for each stock\n",
    "        n_stocks = # ...\n",
    "        # loop over each asset\n",
    "\n",
    "        for i in range(n_stocks):\n",
    "            # TODO: \"dependent\" is a 2D numpy array that\n",
    "            # has one stock series in each column,\n",
    "            # and days are along the rows.\n",
    "            # set y equal to all rows for column i of \"dependent\"\n",
    "            y = # ...\n",
    "            \n",
    "            # TODO: run a regression only if all values of y\n",
    "            # are finite.\n",
    "            if # ... :\n",
    "                # create a LinearRegression object\n",
    "                regressor = LinearRegression()\n",
    "                \n",
    "                # TODO: fit the regressor on X and y\n",
    "                \n",
    "                \n",
    "                # store the beta coefficient\n",
    "                out.beta[i] = regressor.coef_[0]\n",
    "                \n",
    "                #TODO: store the gamma coefficient\n",
    "                # ...\n",
    "            else:\n",
    "                # store beta as not-a-number\n",
    "                out.beta[i] = np.nan\n",
    "                \n",
    "                # TODO: store gammas not-a-number\n",
    "                # ...\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quiz 3: Create conditional factor\n",
    "\n",
    "We can create the conditional factor as the product of beta and gamma factors.\n",
    "\n",
    "$ joint_{Factor} = \\beta_{Factor} \\times \\gamma_{Factor} $\n",
    "\n",
    "\n",
    "If you see the [documentation for the Factor class](https://www.zipline.io/appendix.html?highlight=customfactor#zipline.pipeline.factors.Factor):\n",
    "\n",
    "> Factors can be combined, both with other Factors and with scalar values, via any of the builtin mathematical operators (+, -, *, etc). This makes it easy to write complex expressions that combine multiple Factors. For example, constructing a Factor that computes the average of two other Factors is simply:\n",
    "\n",
    "```\n",
    "f1 = SomeFactor(...)  \n",
    "f2 = SomeOtherFactor(...)  \n",
    "average = (f1 + f2) / 2.0  \n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Example: we'll call the RegressionAgainstTime constructor,\n",
    "# pass in the \"universe\" variable as our mask, \n",
    "# and get the \"beta\" variable from that object.\n",
    "# Then we'll get the rank based on the beta value.\n",
    "beta_factor = (\n",
    "    RegressionAgainstTime(mask=universe).beta.\n",
    "    rank()\n",
    ")\n",
    "\n",
    "# TODO: similar to the beta factor,\n",
    "# We'll create the gamma factor\n",
    "gamma_factor = # ...\n",
    "\n",
    "# TODO: if we multiply the beta factor and gamma factor,\n",
    "# we can then rank that product to create the conditional factor\n",
    "conditional_factor = # ...\n",
    "\n",
    "p = Pipeline(screen=universe)\n",
    "# Add the beta, gamma and conditional factor to the pipeline\n",
    "p.add(beta_factor, 'time_beta')\n",
    "p.add(gamma_factor, 'time_gamma')\n",
    "p.add(conditional_factor, 'conditional_factor')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the pipeline\n",
    "\n",
    "Note that you can right-click the image and view in a separate window if it's too small."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p.show_graph(format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## run pipeline and view the factor data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = engine.run_pipeline(p, factor_start_date, universe_end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize factor returns\n",
    "\n",
    "These are returns that a theoretical portfolio would have if its stock weights were determined by a single alpha factor's values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from quiz_helper import make_factor_plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "make_factor_plot(df, data_portal, trading_calendar, factor_start_date, universe_end_date);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solution\n",
    "[The solution notebook is here](regression_against_time_solution.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
